Morphological Analysis for Statistical Machine Translation
نویسنده
چکیده
We present a novel morphological analysis technique which induces a morphological and syntactic symmetry between two languages with highly asymmetrical morphological structures to improve statistical machine translation qualities. The technique pre-supposes fine-grained segmentation of a word in the morphologically rich language into the sequence of prefix(es)-stem-suffix(es) and part-of-speech tagging of the parallel corpus. The algorithm identifies morphemes to be merged or deleted in the morphologically rich language to induce the desired morphological and syntactic symmetry. The technique improves Arabic-to-English translation qualities significantly when applied to IBM Model 1 and Phrase Translation Models trained on the training corpus size ranging from 3,500 to 3.3 million sentence pairs.
منابع مشابه
Myanmar Phrases Translation Model with Morphological Analysis for Statistical Myanmar to English Translation System
This paper presents Myanmar phrases translation model with morphological analysis. The system is based on statistical approach. In statistical machine translation, large amount of information is needed to guide the translation process. When small amount of training data is available, morphological analysis is needed especially for morphology rich language. Myanmar language is inflected language...
متن کاملA new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...
متن کاملThe tÜBITAK-UEKAE statistical machine translation system for IWSLT 2009
We describe our Arabic-to-English and Turkish-to-English machine translation systems that participated in the IWSLT 2009 evaluation campaign. Both systems are based on the Moses statistical machine translation toolkit, with added components to address the rich morphology of the source languages. Three different morphological approaches are investigated for Turkish. Our primary submission uses l...
متن کاملEleftherios Avramidis and Jonas Kuhn: Exploiting XLE's finite state interface in LFG-based statistical machine translation
We present the addition of a morphological generation component to an LFG-based Statistical Machine Translation System, taking advantage of existing morphological grammars and the FST (Finite State Transducer) processing pipeline of the XLE system. The extended syntax-driven translation system takes separate stochastic decisions for lemmata and morphological tags; the role of finite-state morph...
متن کاملStatistical Machine Translation of Australian Aboriginal Languages: Morphological Analysis with Languages of Differing Morphological Richness
Morphological analysis is often used during preprocessing in Statistical Machine Translation. Existing work suggests that the benefit would be greater for more highly inflected languages, although to our knowledge this has not been systematically tested on languages with comparable morphology. In this paper, two comparable languages with different amounts of inflection are tested, to see if the...
متن کامل